For each of the following series, make a graph of the data. If transforming seems appropriate, do so and describe the effect.
- United States GDP from global_economy
- Slaughter of Victorian “Bulls, bullocks and steers” in aus_livestock
- Victorian Electricity Demand from vic_elec
- Gas production from aus_production
us_economy <- global_economy |>
filter(Country == "United States")
us_economy |>
autoplot(GDP)
us_economy |>
autoplot(box_cox(GDP, 0))
us_economy |>
autoplot(box_cox(GDP, 0.3))
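For reference, box_cox(GDP, 0) above is a log transformation, and non-zero values of \(\lambda\) give a scaled power transformation. A minimal base-R sketch of the transformation for positive data follows. The helper box_cox_sketch is hypothetical and written only for illustration; it is not the box_cox() function used in the solutions, and the input values are made up.

```r
# Hypothetical helper for illustration only (assumes y > 0).
box_cox_sketch <- function(y, lambda) {
  if (lambda == 0) {
    log(y)                      # limiting case as lambda -> 0
  } else {
    (y^lambda - 1) / lambda     # scaled power transformation
  }
}

y <- c(1, 10, 100, 1000)        # made-up values, not the GDP data
box_cox_sketch(y, 0)            # identical to log(y)
box_cox_sketch(y, 0.3)          # compresses large values less than the log does
```

Values of \(\lambda\) between 0 and 1 therefore sit between a log transformation and no transformation at all, which is why 0 and 0.3 are both worth plotting.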
Let’s see what Guerrero’s method suggests.
us_economy |>
features(GDP, features = guerrero)
## # A tibble: 1 × 2
## Country lambda_guerrero
## <fct> <dbl>
## 1 United States 0.282
That is pretty close to \(\lambda = 0.3\). Let’s see how it looks:
us_economy |>
autoplot(box_cox(GDP, 0.2819714))
vic_bulls <- aus_livestock |>
filter(State == "Victoria", Animal == "Bulls, bullocks and steers")
vic_bulls |>
autoplot(Count)
vic_bulls |>
autoplot(log(Count))
vic_bulls |>
features(Count, features = guerrero)
## # A tibble: 1 × 3
## Animal State lambda_guerrero
## <fct> <fct> <dbl>
## 1 Bulls, bullocks and steers Victoria -0.0446
vic_elec |>
autoplot(Demand)
Time-of-day seasonal patterns are hidden by the density of ink.
Day-of-week seasonality just visible.
Time-of-year seasonality is clear with increasing variance in winter and high skewness in summer.
vic_elec |>
autoplot(box_cox(Demand, 0))
A log transformation makes the variance more even and reduces the skewness.
Guerrero’s method doesn’t work here as there are several types of seasonality.
aus_production |>
autoplot(Gas)
aus_production |>
autoplot(box_cox(Gas, 0))
aus_production |>
features(Gas, features = guerrero)
## # A tibble: 1 × 1
## lambda_guerrero
## <dbl>
## 1 0.110
aus_production |>
autoplot(box_cox(Gas, 0.1095))
Looking good! The variation is now constant across the series.
Why is a Box-Cox transformation unhelpful for the canadian_gas data?
canadian_gas |>
autoplot(Volume) +
labs(
x = "Year", y = "Gas production (billion cubic meters)",
title = "Monthly Canadian gas production"
)
Here the variation in the series is not proportional to the amount of gas production in Canada.
When small and large amounts of gas are being produced, we observe small variation in the seasonal pattern.
However, between 1975 and 1990 the gas production is moderate, and the variation is large.
Power transformations (like the Box-Cox transformation) require the variability of the series to vary proportionately to the level of the series.
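To see why, here is a short delta-method sketch (a first-order approximation added for illustration). Write \(\mu_t\) and \(\sigma_t\) for the local level and standard deviation of the series \(y_t\), and \(w_t\) for the Box-Cox transformed value with parameter \(\lambda\):

```latex
\begin{align*}
w_t &= \begin{cases}
  \log y_t, & \lambda = 0, \\
  \dfrac{y_t^{\lambda} - 1}{\lambda}, & \lambda \neq 0,
\end{cases} \\
\frac{dw_t}{dy_t} &= y_t^{\lambda - 1}, \\
\operatorname{sd}(w_t) &\approx \mu_t^{\lambda - 1}\, \sigma_t .
\end{align*}
```

The transformed series has roughly constant spread only if \(\sigma_t\) is proportional to \(\mu_t^{1-\lambda}\), which is a monotonic relationship between spread and level. In the Canadian gas series the spread is largest at moderate levels, so no single \(\lambda\) can satisfy this.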
For the following series, find an appropriate Box-Cox transformation in order to stabilise the variance: Tobacco from aus_production, Economy class passengers between Melbourne and Sydney from ansett, and Pedestrian counts at Southern Cross Station from pedestrian.
aus_production |>
autoplot(Tobacco)
aus_production |>
autoplot(log(Tobacco))
aus_production |>
features(Tobacco, features = guerrero)
## # A tibble: 1 × 1
## lambda_guerrero
## <dbl>
## 1 0.926
aus_production |>
autoplot(box_cox(Tobacco, 0.926))
ansett |>
filter(Airports == "MEL-SYD", Class == "Economy") |>
autoplot(Passengers) +
labs(title = "Economy passengers", subtitle = "MEL-SYD")
The data does not appear to vary proportionally to the level of the series.
There are many periods in this time series (such as the strike and change in seat classes) that may need further attention, but this is probably better resolved with modelling rather than transformations.
pedestrian |>
filter(Sensor == "Southern Cross Station") |>
autoplot(Count) +
labs(title = "Southern Cross Pedestrians")
log(x+1) transformation:
pedestrian |>
filter(Sensor == "Southern Cross Station") |>
autoplot(log1p(Count)) +
labs(title = "Southern Cross Pedestrians")
That’s roughly balanced the two tails.
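One reason to prefer log1p() over log() for count data is that counts can be zero, and log(0) is -Inf. A small base-R illustration with made-up counts (not the pedestrian data):

```r
counts <- c(0, 9, 99, 999)   # made-up counts for illustration
log(counts)                  # first element is -Inf
log1p(counts)                # log(counts + 1): first element is 0
expm1(log1p(counts))         # back-transform recovers the original counts
```

The matching back-transformation is expm1(), so anything computed on the log1p scale can be returned to the original count scale.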
Consider the last five years of the Gas data from aus_production.
gas <- tail(aus_production, 5*4) |> select(Gas)
- Plot the time series. Can you identify seasonal fluctuations and/or a trend-cycle?
gas <- tail(aus_production, 5 * 4) |> select(Gas)
gas |>
autoplot(Gas) + labs(y = "Petajoules")
There is some strong seasonality and a trend.
- Use classical_decomposition with type=multiplicative to calculate the trend-cycle and seasonal indices.
- Do the results support the graphical interpretation from part a?
decomp <- gas |>
model(decomp = classical_decomposition(Gas, type = "multiplicative")) |>
components()
decomp |> autoplot()
The decomposition has captured the seasonality and a slight trend.
- Compute and plot the seasonally adjusted data.
as_tsibble(decomp) |>
autoplot(season_adjust) +
labs(title = "Seasonally adjusted data", y = "Petajoules")
- Change one observation to be an outlier (e.g., add 300 to one observation), and recompute the seasonally adjusted data. What is the effect of the outlier?
- Does it make any difference if the outlier is near the end rather than in the middle of the time series?
gas |>
mutate(Gas = if_else(Quarter == yearquarter("2007Q4"), Gas + 300, Gas)) |>
model(decomp = classical_decomposition(Gas, type = "multiplicative")) |>
components() |>
as_tsibble() |>
autoplot(season_adjust) +
labs(title = "Seasonally adjusted data", y = "Petajoules")
gas |>
mutate(Gas = if_else(Quarter == yearquarter("2010Q2"), Gas + 300, Gas)) |>
model(decomp = classical_decomposition(Gas, type = "multiplicative")) |>
components() |>
as_tsibble() |>
autoplot(season_adjust) +
labs(title = "Seasonally adjusted data", y = "Petajoules")
The seasonally adjusted data now show no seasonality because the outlier is in the part of the data where the trend can’t be estimated.
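To make the role of the trend estimate concrete, here is a minimal base-R sketch of a classical multiplicative decomposition for quarterly data. The series y is made up for illustration (it is not the Gas data), and the steps follow the usual textbook description of classical decomposition rather than the internals of classical_decomposition(). The centred moving average leaves the first two and last two trend values missing, which is the region where the trend can’t be estimated.

```r
# Made-up quarterly series: a rising level times a fixed seasonal pattern.
level <- seq(100, 138, by = 2)                    # 20 quarters
pattern <- rep(c(0.9, 1.1, 1.2, 0.8), times = 5)  # averages to 1
y <- level * pattern

# Step 1: trend-cycle via a 2x4 centred moving average.
w <- c(0.5, 1, 1, 1, 0.5) / 4
trend <- as.numeric(stats::filter(y, w, sides = 2))

# Step 2: detrend, then average by quarter to get seasonal indices.
quarter <- rep(1:4, length.out = length(y))
seasonal_idx <- tapply(y / trend, quarter, mean, na.rm = TRUE)
seasonal_idx <- as.numeric(seasonal_idx / mean(seasonal_idx))  # normalise to mean 1

# Step 3: seasonally adjusted series = data / seasonal component.
season_adjust <- y / seasonal_idx[quarter]

which(is.na(trend))   # positions with no trend estimate: 1 2 19 20
```

Because the detrended values at those four positions are missing, an observation there does not enter the seasonal index averages, whereas an outlier in the middle of the series affects both the trend and the seasonal indices.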
Figures 3.16 and 3.17 show the result of decomposing the number of persons in the civilian labour force in Australia each month from February 1978 to August 1995.
- Write about 3–5 sentences describing the results of the decomposition. Pay particular attention to the scales of the graphs in making your interpretation.
- Is the recession of 1991/1992 visible in the estimated components?
Yes. The remainder shows a substantial drop during 1991 and 1992 coinciding with the recession.
This exercise uses the canadian_gas data (monthly Canadian gas production in billions of cubic metres, January 1960 – February 2005).
- Plot the data using autoplot(), gg_subseries() and gg_season() to look at the effect of the changing seasonality over time. What do you think is causing it to change so much?
canadian_gas |> autoplot(Volume)
canadian_gas |> gg_subseries(Volume)
canadian_gas |> gg_season(Volume)
- Do an STL decomposition of the data. You will need to choose a seasonal window to allow for the changing shape of the seasonal component.
fit <- canadian_gas |>
model(STL(Volume)) |>
components()
fit
## # A dable: 542 x 7 [1M]
## # Key: .model [1]
## # : Volume = trend + season_year + remainder
## .model Month Volume trend season_year remainder season_adjust
## <chr> <mth> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 STL(Volume) 1960 Jan 1.43 1.08 0.520 -0.172 0.911
## 2 STL(Volume) 1960 Feb 1.31 1.11 0.215 -0.0178 1.09
## 3 STL(Volume) 1960 Mar 1.40 1.13 0.307 -0.0395 1.09
## 4 STL(Volume) 1960 Apr 1.17 1.16 0.0161 -0.00627 1.15
## 5 STL(Volume) 1960 May 1.12 1.18 -0.116 0.0476 1.23
## 6 STL(Volume) 1960 Jun 1.01 1.21 -0.356 0.159 1.37
## 7 STL(Volume) 1960 Jul 0.966 1.23 -0.403 0.136 1.37
## 8 STL(Volume) 1960 Aug 0.977 1.26 -0.349 0.0677 1.33
## 9 STL(Volume) 1960 Sep 1.03 1.28 -0.340 0.0870 1.37
## 10 STL(Volume) 1960 Oct 1.25 1.31 -0.0899 0.0329 1.34
## # ℹ 532 more rows
names(fit)
## [1] ".model" "Month" "Volume" "trend"
## [5] "season_year" "remainder" "season_adjust"
fit |> autoplot()
- How does the seasonal shape change over time? [Hint: Try plotting the seasonal component using gg_season().]
fit |> gg_season(season_year)
- Can you produce a plausible seasonally adjusted series?
canadian_gas |>
autoplot(Volume) +
autolayer(fit, season_adjust, col = "blue")
- Compare the results with those obtained using SEATS and X11. How are they different?
# remember to load library(seasonal) before attempting this question!
canadian_gas |>
model(X_13ARIMA_SEATS(Volume ~ seats())) |>
components() |>
autoplot()
canadian_gas |>
model(X_13ARIMA_SEATS(Volume ~ x11())) |>
components() |>
autoplot()
Note that SEATS fits a multiplicative decomposition by default, so it is hard to directly compare the results with the other two methods.
The X11 seasonal component is quite similar to the STL seasonal component. Both SEATS and X11 have estimated a more wiggly trend line than STL.
Take-home message: SEATS is multiplicative by default; X11 is based on moving averages (MA).